Sains Malaysiana 54(1)(2025): 279-290
http://doi.org/10.17576/jsm-2025-5401-22
Optimizing Tuberculosis Treatment Predictions: A Comparative Study of XGBoost with Hyperparameter in Penang, Malaysia
(Mengoptimumkan Peramalan Rawatan Tuberkulosis: Suatu Kajian Perbandingan XGBoost dengan Hiperparameter di Penang, Malaysia)
YANIZA SHAIRA ZAKARIA1, NUR AFIQAH ARIFFIN2,*, AZIZUL AHMAD3, RUSLAN RAINIS2, AIDY M. MUSLIM1 & WAN MOHD MUHIYUDDIN WAN IBRAHIM2
1Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia
2Geography Section, School of Humanities, Universiti Sains Malaysia (USM), 11800 Pulau Pinang, Malaysia
3Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social Sciences & Humanities, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
Received: 24 April 2024/Accepted: 4 November 2024
Abstract
Tuberculosis (TB) is a bacterial infection caused by Mycobacterium tuberculosis that mainly affects the lungs but can also affect other organs, including the liver. TB is a significant public health concern in developing countries, where it is often associated with poverty, poor living conditions, and limited access to healthcare services. According to the World Health Organization (2023), TB continues to pose a substantial risk to public health globally, affecting millions of people each year and causing around 1.5 million deaths in 2020. Healthcare providers often face significant challenges in managing TB, which leads to uncertain treatment outcomes. This study introduces a novel method for improving TB treatment using advanced machine learning techniques, with particular emphasis on XGBoost and other predictive models, to predict individual treatment outcomes from clinical data in Penang State, Malaysia. The models were trained on anonymized clinical data from Penang for 2017, and their predictive accuracy was compared to identify the best-performing method. Accuracy was determined by comparing observed outcomes with predicted outcomes. The decision tree achieved an accuracy of 63.7%, logistic regression 63.3%, and XGBoost 66.3%, while hyperparameter-tuned XGBoost performed best, at 68.1%. These results show that supervised learning can be used to predict TB treatment outcomes and that calibrated ensemble models such as XGBoost make reliable predictions. Additional clinical characteristics may further improve forecasts. The primary objective was to develop a reliable, clinically validated instrument that improves TB treatment while optimizing resource efficiency across diverse healthcare environments.
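The abstract states that accuracy was determined by comparing observed and predicted treatment outcomes. The following dependency-free Python is a minimal sketch of that calculation. The outcome labels (`"success"`, `"failure"`) and the ten patient records are hypothetical illustrations, not study data, because the abstract does not say how outcomes were coded. The four figures in `reported` are the accuracies given in the abstract.

```python
from collections import Counter


def accuracy(observed, predicted):
    """Share of cases whose predicted outcome equals the observed outcome."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one case is required")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def confusion_counts(observed, predicted):
    """Count each (observed, predicted) pair, e.g. ("failure", "success")."""
    return Counter(zip(observed, predicted))


# Hypothetical outcomes for ten patients (illustrative only, not study data).
observed = ["success", "success", "failure", "success", "failure",
            "success", "success", "failure", "success", "success"]
predicted = ["success", "failure", "failure", "success", "success",
             "success", "success", "failure", "success", "success"]

print(accuracy(observed, predicted))  # 0.8
print(confusion_counts(observed, predicted))

# Accuracies reported in the abstract (2017 Penang data).
reported = {
    "Decision tree": 0.637,
    "Logistic regression": 0.633,
    "XGBoost": 0.663,
    "XGBoost (hyperparameter-tuned)": 0.681,
}
best_model = max(reported, key=reported.get)
print(best_model)  # XGBoost (hyperparameter-tuned)
```

When the outcome classes are imbalanced, accuracy alone can hide poor performance on the minority outcome. For that reason the sketch also returns the confusion counts.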
Keywords: Classification; hyperparameter; logistic regression; prediction; random forest; tuberculosis
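The title and abstract attribute the best result (68.1%) to hyperparameter-tuned XGBoost. However, the abstract does not say which hyperparameters were searched or how the search was done. One common procedure is grid search with k-fold cross-validation. Every combination of candidate values is scored on held-out folds, and the best-scoring combination is kept.

The sketch below shows that procedure in dependency-free Python. It rests on several assumptions:

- A simple k-nearest-neighbours classifier, with hypothetical hyperparameter `n_neighbors`, stands in for XGBoost so the example runs without third-party packages.
- The two-feature records are hypothetical.
- The helper names are illustrative.

With scikit-learn and the xgboost package, the same idea is usually expressed as `GridSearchCV` wrapping an `XGBClassifier` over parameters such as `max_depth`, `learning_rate` and `n_estimators`. That setup is also an assumption, not the authors' reported configuration.

```python
import itertools
import math
from collections import Counter


def k_fold_indices(n, n_folds):
    """Split range(n) into n_folds contiguous folds (no shuffling, for determinism)."""
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, params, n_folds):
    """Cross-validated accuracy: each case is predicted by a model that never saw it."""
    correct = 0
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            if knn_predict(train_X, train_y, X[i], params["n_neighbors"]) == y[i]:
                correct += 1
    return correct / len(X)


def grid_search(X, y, param_grid, n_folds=5):
    """Score every parameter combination; return best params, best score, all results."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_accuracy(X, y, params, n_folds), params))
    best_score, best_params = max(results, key=lambda r: r[0])  # first best wins ties
    return best_params, best_score, results


# Hypothetical two-feature records, alternating between the two outcome classes.
X = [(0, 0), (10, 10), (1, 0), (11, 10), (0, 1),
     (10, 11), (1, 1), (11, 11), (0.5, 0.5), (10.5, 10.5)]
y = ["success", "failure"] * 5

best_params, best_score, results = grid_search(X, y, {"n_neighbors": [1, 3, 5]})
print(best_params, best_score)  # {'n_neighbors': 1} 1.0
```

In this toy grid, every candidate separates the two well-spaced clusters perfectly, so all three score 1.0 and the first candidate is kept. On real clinical data the scores would differ. The cross-validated score of the selected setting is also an optimistic estimate unless it is re-checked on a separate test set.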
Abstrak
Tuberkulosis (TB) ialah jangkitan bakteria, disebabkan oleh Mycobacterium tuberculosis, yang terutamanya menjejaskan paru-paru tetapi juga boleh menjejaskan organ lain termasuk hati. TB merupakan kebimbangan kesihatan awam yang signifikan di negara-negara membangun dan sering dikaitkan dengan kemiskinan, keadaan hidup yang buruk dan akses terhad kepada perkhidmatan kesihatan. Menurut Pertubuhan Kesihatan Sedunia (2023), TB terus menimbulkan risiko yang besar kepada kesihatan awam di peringkat global dengan berjuta-juta orang terjejas setiap tahun dan sekitar 1.5 juta kematian pada tahun 2020. Penyedia penjagaan kesihatan sering menghadapi cabaran besar dalam menangani TB, yang membawa kepada hasil rawatan yang tidak menentu. Kajian ini memperkenalkan kaedah baharu untuk meningkatkan rawatan TB menggunakan teknik pembelajaran mesin yang canggih, dengan penekanan khusus kepada aplikasi XGBoost dan pelbagai model ramalan di Pulau Pinang, Malaysia, untuk meramalkan hasil rawatan individu berdasarkan data klinikal. Model-model tersebut dilatih menggunakan data klinikal Pulau Pinang tahun 2017 yang telah dianonimkan, dan ketepatan ramalannya dibandingkan untuk menentukan kaedah yang optimum. Ketepatan ditentukan dengan membandingkan hasil yang diperhatikan dengan hasil yang diramalkan. Ketepatan pokok keputusan ialah 63.7%, regresi logistik 63.3% dan XGBoost 66.3%, manakala XGBoost yang diselaraskan hiperparameternya berprestasi terbaik pada 68.1%. Hasil ini menunjukkan bahawa pembelajaran terawasi boleh digunakan untuk meramalkan hasil rawatan TB dan model himpunan yang dikalibrasi seperti XGBoost memberikan ramalan yang boleh dipercayai. Ciri klinikal tambahan mungkin dapat meningkatkan lagi ramalan. Objektif utama adalah untuk membangunkan instrumen yang boleh dipercayai dan disahkan secara klinikal yang meningkatkan rawatan TB sambil mengoptimumkan kecekapan sumber dalam pelbagai persekitaran penjagaan kesihatan.
Kata kunci: Hiperparameter; hutan rawak; pengelasan; ramalan; regresi logistik; Tuberkulosis
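The abstract describes XGBoost as a calibrated ensemble model that makes reliable predictions, but it does not say how calibration was assessed. A common check uses a reliability table. Predicted probabilities are grouped into bins, and each bin's mean predicted probability is compared with the observed event rate. The table can be summarized as the expected calibration error (ECE), the count-weighted mean of the per-bin gaps.

The sketch below is a minimal dependency-free version of that check. It assumes a binary outcome coded 1 for treatment success and 0 otherwise, and it uses hypothetical probabilities and outcomes. Five equal-width bins is an arbitrary choice. scikit-learn's `sklearn.calibration.calibration_curve` produces the binned part of such a table.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """For each non-empty equal-width bin: (count, mean predicted prob, observed rate)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[index].append((p, o))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(o for _, o in members) / len(members)
            table.append((len(members), mean_p, rate))
    return table


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Count-weighted mean absolute gap between predicted probability and observed rate."""
    total = len(probs)
    return sum(count / total * abs(mean_p - rate)
               for count, mean_p, rate in calibration_table(probs, outcomes, n_bins))


# Hypothetical predicted probabilities of treatment success and observed outcomes.
probs = [0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.7, 0.75, 0.9, 0.95]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

for count, mean_p, rate in calibration_table(probs, outcomes):
    print(count, round(mean_p, 3), rate)
print(round(expected_calibration_error(probs, outcomes), 3))  # 0.135
```

A well-calibrated model has observed rates close to the mean predicted probabilities in every bin, so its ECE is close to zero. With small samples, as here, the per-bin rates are noisy, and the table should be read with that in mind.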
REFERENCES
Abdullahi, O.A., Ngari, M.M., Sanga, D., Katana, G. & Willetts, A. 2019. Mortality during treatment for tuberculosis; a review of surveillance data in a rural county in Kenya. PLoS ONE 14(7): e0219191. https://doi.org/10.1371/journal.pone.0219191
Ahmad, A., Kelana, M.H., Soda, R., Jubit, N., Mohd Ali, A.S., Bismelah, L.H. & Masron, T. 2024a. Mapping the impact: Property crime trends in Kuching, Sarawak, during and after the COVID-19 period (2020-2022). Indonesian Journal of Geography 56(1): 127-137. https://doi.org/10.22146/ijg.90057
Ahmad, A., Masron, T., Jubit, N., Redzuan, M.S., Soda, R., Bismelah, L.H. & Mohd Ali, A.S. 2024b. Analysis of the movement distribution pattern of violence crime in Malaysia’s capital region-Selangor, Kuala Lumpur, and Putrajaya. International Journal of Geoinformatics 20(2): 11-26. https://doi.org/10.52939/ijg.v20i2.3061
Ahmad, A., Masron, T., Junaini, S.N., Barawi, M.H., Redzuan, M.S., Kimura, Y., Jubit, N., Bismelah, L.H. & Mohd Ali, A.S. 2024c. Criminological insights: A comprehensive spatial analysis of crime hot spots of property offenses in Malaysia’s urban centers. Forum Geografi: Indonesian Journal of Spatial and Regional Analysis 38(1): 94-109. https://doi.org/10.23917/forgeo.v38i1.4306
Ahmad, A., Masron, T., Junaini, S.N., Kimura, Y., Barawi, M.H., Jubit, N., Redzuan, M.S., Bismelah, L.H. & Mohd Ali, A.S. 2024d. Mapping the unseen: Dissecting property crime dynamics in urban Malaysia through spatial analysis. Transactions in GIS 28(6): 1486-1509. https://doi.org/10.1111/tgis.13197
Ahmad, A., Masron, T., Kimura, Y., Barawi, M.H., Jubit, N., Junaini, S.N., Redzuan, M.S., Mohd Ali, A.S. & Bismelah, L.H. 2024e. Unveiling urban violence crime in the state of The Selangor, Kuala Lumpur and Putrajaya: A spatial–temporal investigation of violence crime in Malaysia’s key cities. Cogent Social Sciences 10(1): 2347411. https://doi.org/10.1080/23311886.2024.2347411
Ahmad, A., Masron, T., Mohd Ali, A.S., Barawi, M.H., Nordin, Z.S., Abg Ahmad, A.I., Redzuan, M.S. & Bismelah, L.H. 2024f. Exploring the potential of geographic information system (GIS) application for understanding spatial distribution of violent crime related to United Nations sustainable development goals-16 (SDGS-16). Journal of Sustainability Science and Management 19(9): 35-63. https://doi.org/10.46754/jssm.2024.09.003
Ahmad, A., Masron, T., Mohd Ali, A.S., Kimura, Y. & Junaini, S.N. 2024g. Demographic dynamics and urban property crime: A linear regression analysis in Kuala Lumpur and Putrajaya (2015-2020). Planning Malaysia: Journal of the Malaysian Institute of Planners 22(4): 302-319. https://doi.org/10.21837/pm.v22i33.1550
Ahmad, A., Masron, T., Ringkai, E., Barawi, M.H., Salleh, M.S., Jubit, N. & Redzuan, M.S. 2024h. Analisis ruangan hot spot jenayah pecah rumah di negeri Selangor, Kuala Lumpur dan Putrajaya pada tahun 2015-2020. Geografia-Malaysian Journal of Society and Space 20(1): 49-67. https://doi.org/10.17576/geo-2024-2001-04
Ali, A., Alrubei, M.A.T., Hassan, L.F.M., Al-Ja’afari, M.A.M. & Abdulwahed, S.H. 2020. Diabetes diagnosis based on KNN. IIUM Engineering Journal 21(1): 175-181. https://doi.org/10.31436/iiumej.v21i1.1206
Ariffin, N.A., Wan Ibrahim, W.M.M., Rainis, R., Samat, N., Mohd Nasir, M.I., Abdul Rashid, S.M.R., Ahmad, A. & Zakaria, Y.S. 2024. Identification of trends, direction of distribution and spatial pattern of tuberculosis disease (2015-2017) in Penang. Geografia-Malaysian Journal of Society and Space 20(1): 68-84. https://doi.org/10.17576/geo-2024-2001-05
Bismelah, L.H., Masron, T., Ahmad, A., Mohd Ali, A.S. & Echoh, D.U. 2024. Geospatial assessment of healthcare distribution and population density in Sri Aman, Sarawak, Malaysia. Geografia-Malaysian Journal of Society and Space 20(3): 51-67. https://doi.org/10.17576/geo-2024-2003-04
Bukundi, E.M., Mhimbira, F., Kishimba, R., Kondo, Z. & Moshiro, C. 2021. Mortality and associated factors among adult patients on tuberculosis treatment in Tanzania: A retrospective cohort study. Journal of Clinical Tuberculosis and Other Mycobacterial Diseases 24: 100263. https://doi.org/10.1016/j.jctube.2021.100263
Chabo, D., Masron, T., Jubit, N. & Ahmad, A. 2024. Analisis corak ruangan keciciran murid sekolah menengah di Sarawak. Malaysian Journal of Social Sciences and Humanities 9(9): e002906. https://doi.org/10.47405/mjssh.v9i9.2906
Dheda, K., Perumal, T., Moultrie, H., Perumal, R., Esmail, A., Scott, A.J., Udwadia, Z., Chang, K.C., Peter, J., Pooran, A., von Delft, A., von Delft, D., Martinson, N., Loveday, M., Charalambous, S., Kachingwe, E., Jassat, W., Cohen, C., Tempia, S., Fennelly, K. & Pai, M. 2022. The intersecting pandemics of tuberculosis and COVID-19: Population-level and patient-level impact, clinical presentation, and corrective interventions. The Lancet Respiratory Medicine 10(6): 603-622. https://doi.org/10.1016/S2213-2600(22)00092-3
Fayaz, S.A., Babu, L., Paridayal, L., Vasantha, M., Paramasivam, P., Sundarakumar, K. & Ponnuraja, C. 2024. Machine learning algorithms to predict treatment success for patients with pulmonary tuberculosis. PLoS ONE 19(10): e0309151. https://doi.org/10.1371/journal.pone.0309151
Gichuhi, H.W., Magumba, M., Kumar, M. & Mayega, R.W. 2023. A machine learning approach to explore individual risk factors for tuberculosis treatment non-adherence in Mukono district. PLOS Global Public Health 3(7): e0001466. https://doi.org/10.1371/journal.pgph.0001466
Gill, C.M., Dolan, L., Piggott, L.M. & McLaughlin, A.M. 2022. New developments in tuberculosis diagnosis and treatment. Breathe 18(1): 210149. https://doi.org/10.1183/20734735.0149-2021
Hrizi, O., Gasmi, K., Ben Ltaifa, I., Alshammari, H., Karamti, H., Krichen, M., Ben Ammar, L. & Mahmood, M.A. 2022. Tuberculosis disease diagnosis based on an optimized machine learning model. Journal of Healthcare Engineering 2022: 8950243. https://doi.org/10.1155/2022/8950243
Hussain, O.A. & Junejo, K.N. 2018. Predicting treatment outcome of drug-susceptible tuberculosis patients using machine-learning models. Informatics for Health and Social Care 44(2): 135-151. https://doi.org/10.1080/17538157.2018.1433676
Janssens, R.J., Mourão-Miranda, J. & Schnack, H.G. 2018. Making individual prognoses in psychiatry using neuroimaging and machine learning. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 3(9): 798-808. https://doi.org/10.1016/j.bpsc.2018.04.004
Jubit, N., Masron, T., Ahmad, A. & Soda, R. 2024a. Investigating the spatial relation between landuse and property crime in Kuching, Sarawak through location quotient analysis. Forum Geografi: Indonesian Journal of Spatial and Regional Analysis 38(2): 153-166. https://doi.org/10.23917/forgeo.v38i2.4575
Jubit, N., Masron, T., Redzuan, M.S., Ahmad, A. & Kimura, Y. 2024b. Revealing adolescent drug trafficking and addiction: Exploring school disciplinary and drug issues in the Federal Territory of Kuala Lumpur and Selangor, Malaysia. International Journal of Geoinformatics 20(6): 1-12. https://doi.org/10.52939/ijg.v20i6.3327
Jubit, N., Masron, T., Puyok, A. & Ahmad, A. 2023. Geographic distribution of voter turnout, ethnic turnout and vote choices in Johor state election. Geografia-Malaysian Journal of Society and Space 19(4): 64-76. https://doi.org/10.17576/geo-2023-1904-05
Kouchaki, S., Yang, Y., Walker, T.M., Walker, A.S., Wilson, D.J., Peto, T.E.A., Crook, D.W., CRyPTIC Consortium & Clifton, D.A. 2019. Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinformatics 35(13): 2276-2282. https://doi.org/10.1093/bioinformatics/bty949
Lopez-Garnier, S., Sheen, P. & Zimic, M. 2019. Automatic diagnostics of tuberculosis using convolutional neural networks analysis of MODS digital images. PLoS ONE 14(2): e0212094. https://doi.org/10.1371/journal.pone.0212094
Marzuki, A., Bagheri, M., Ahmad, A., Masron, T. & Akhir, M.F. 2024. Examining transformations in coastal city landscapes: Spatial patch analysis of sustainable tourism - A case study in Pahang, Malaysia. Landscape and Ecological Engineering 20: 513-545. https://doi.org/10.1007/s11355-024-00613-w
Marzuki, A., Bagheri, M., Ahmad, A., Masron, T. & Akhir, M.F. 2023. Establishing a GIS-SMCDA model of sustainable eco-tourism development in Pahang, Malaysia. Episodes 46(3): 375-387. https://doi.org/10.18814/epiiugs/2022/022037
Masron, T., Ahmad, A., Jubit, N., Sulaiman, M.H., Rainis, R., Redzuan, M.S., Junaini, S.N., Jamian, M.A.H., Mohd Ali, A.S., Salleh, M.S., Zaini, F., Soda, R. & Kimura, Y. 2024. Crime Map Book. Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social Sciences and Humanities, Universiti Malaysia Sarawak. https://www.researchgate.net/publication/384572873_Crime_Map_Book
Miotto, R., Li, L., Kidd, B.A. & Dudley, J.T. 2016. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports 6: 26094. https://doi.org/10.1038/srep26094
Nicholson, T.J., Hoddinott, G., Seddon, J.A., Claassens, M.M., van der Zalm, M.M., Lopez, E., Bock, P., Caldwell, J., Da Costa, D., de Vaal, C., Dunbar, R., Du Preez, K., Hesseling, A.C., Joseph, K., Kriel, E., Loveday, M., Marx, F.M., Meehan, S.A., Purchase, S., Naidoo, K., Naidoo, L., Solomon-Da, C.F., Sloot, R. & Osman, M. 2023. A systematic review of risk factors for mortality among tuberculosis patients in South Africa. Systematic Reviews 12(1): 23. https://doi.org/10.1186/s13643-023-02175-8
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12: 2825-2830. https://dl.acm.org/doi/10.5555/1953048.2078195
Takarinda, K.C., Sandy, C., Masuka, N., Hazangwe, P., Choto, R.C., Mutasa-Apollo, T., Nkomo, B., Sibanda, E., Mugurungi, O., Harries, A.D. & Siziba, N. 2017. Factors associated with mortality among patients on TB treatment in the Southern Region of Zimbabwe, 2013. Tuberculosis Research and Treatment 2017: 6232071. https://doi.org/10.1155/2017/6232071
Tiwari, A. & Maji, S. 2019. Machine learning techniques for tuberculosis prediction. International Conference on Advances in Engineering Science Management & Technology (ICAESMT) - 2019, Uttaranchal University, Dehradun, India. https://ssrn.com/abstract=3404486 or http://dx.doi.org/10.2139/ssrn.3404486
World Health Organization. 2022. Global Tuberculosis Report 2022. https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2022
World Health Organization. 2023. Tuberculosis. World Health Organization. https://www.who.int/news-room/fact-sheets/detail/tuberculosis
Xie, Y., Han, J., Yu, W., Wu, J., Li, X. & Chen, H. 2020. Survival analysis of risk factors for mortality in a cohort of patients with tuberculosis. Canadian Respiratory Journal 2020: 1654653. https://doi.org/10.1155/2020/1654653
Xiong, Y., Ba, X., Hou, A., Zhang, K., Chen, L. & Li, T. 2018. Automatic detection of Mycobacterium tuberculosis using artificial intelligence. Journal of Thoracic Disease 10(3): 1936-1940. https://doi.org/10.21037/jtd.2018.01.91
Yang, S., Zhu, F., Ling, X., Liu, Q. & Zhao, P. 2021. Intelligent health care: Applications of deep learning in computational medicine. Frontiers in Genetics 12: 607471. https://doi.org/10.3389/fgene.2021.607471
Zakaria, Y.S., Ahmad, A., Said, M.Z., Epa, A.E., Ariffin, N.A., Muslim, A.M., Akhir, M.F. & Hussin, R. 2023. GIS and oil spill tracking model in forecasting potential oil spill-affected areas along Terengganu and Pahang coastal area. Planning Malaysia: Journal of the Malaysian Institute of Planners 21(4): 250-264. https://doi.org/10.21837/pm.v21i28.1330
*Corresponding author; email: zarika27@gmail.com